ABSTRACT

Let M be an arbitrary n by n matrix. We study the condition
number of a random perturbation $M + N_n$ of M, where $N_n$ is a
random matrix. It is shown that, under very general conditions
on M and $N_n$, the condition number of $M + N_n$ is polynomial
in n with very high probability. The main novelty here is that
we allow $N_n$ to have a discrete distribution.

1. INTRODUCTION 
1.1 The condition number 

Let M be an n × n matrix. Then

$$\sigma_1(M) := \sup_{x \in \mathbb{R}^n,\ \|x\| = 1} \|Mx\|$$

is the largest singular value of M (this parameter is also
often called the operator norm of M).

If M is invertible, the condition number $\kappa(M)$ is defined as

$$\kappa(M) = \sigma_1(M)\,\sigma_1(M^{-1}).$$

The condition number plays a crucial role in numerical linear
algebra. The accuracy and stability of most algorithms used to
solve the equation $Mx = b$ depend on $\kappa(M)$. The exact
solution $x = M^{-1}b$, in theory, can be computed quickly (by
Gaussian elimination, say). However, in practice computers can
only represent a finite subset of the real numbers, and this
leads to two difficulties. The represented numbers cannot be
arbitrarily large or small, and there are gaps between them. A
quantity which is frequently used in numerical analysis is
$\epsilon_{machine}$, which is half of the distance from 1 to
the nearest represented number. A fundamental result in
numerical analysis [1] asserts that if one denotes by
$\tilde{x}$ the result computed by computers, then the relative
error $\|\tilde{x} - x\| / \|x\|$ satisfies

$$\frac{\|\tilde{x} - x\|}{\|x\|} = O(\epsilon_{machine}\,\kappa(M)).$$

We call M well conditioned if $\kappa(M)$ is small. For
quantitative purposes, we say that an n by n matrix M is well
conditioned if its condition number is polynomially bounded
in n ($\kappa(M) \le n^C$ for some constant C independent of n).
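As a rough numerical illustration of the relative error bound and of the notion of a well conditioned matrix, the following sketch computes the condition number of a 2 by 2 matrix from its two singular values and prints the heuristic quantity $\epsilon_{machine}\,\kappa(M)$. The helper names and the sample matrix are illustrative assumptions and do not come from the paper. Python's `sys.float_info.epsilon` is the gap between 1 and the next representable float, so half of it plays the role of $\epsilon_{machine}$.

```python
import math
import sys

def singular_values_2x2(m):
    """Return (sigma_1, sigma_2) of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    s = a * a + b * b + c * c + d * d      # sigma_1^2 + sigma_2^2 (trace of M^T M)
    det = a * d - b * c                    # |det M| = sigma_1 * sigma_2
    disc = math.sqrt(max(s * s - 4 * det * det, 0.0))
    sigma1 = math.sqrt((s + disc) / 2)
    sigma2 = abs(det) / sigma1 if sigma1 > 0 else 0.0
    return sigma1, sigma2

def condition_number_2x2(m):
    """kappa(M) = sigma_1(M) * sigma_1(M^{-1}) = sigma_1 / sigma_2."""
    sigma1, sigma2 = singular_values_2x2(m)
    return math.inf if sigma2 == 0 else sigma1 / sigma2

EPS_MACHINE = sys.float_info.epsilon / 2   # half the gap between 1 and the next float

# Hypothetical nearly singular example: the two rows are almost parallel.
M = [[1.0, 1.0], [1.0, 1.0 + 1e-8]]
kappa = condition_number_2x2(M)
print("kappa(M) ~", kappa)
print("eps_machine * kappa(M) ~", EPS_MACHINE * kappa)
```

The smaller singular value is obtained from the determinant rather than by subtracting two nearly equal numbers, which keeps the sketch itself reasonably accurate on nearly singular inputs.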

Throughout the paper, we think of n as large, and the asymptotic
notation is used under the assumption that $n \to \infty$.

1.2 Effect of noise 

An important issue in the theory of computing is noise, as
almost all computational processes are affected by it. By
the word noise, we would like to represent all kinds of errors 
occurring in a process, due to both humans and machines, 
including errors in measuring, errors caused by truncations, 
errors committed in transmitting and inputting the data, 
etc. 

It happens frequently that while we are interested in solving
a certain equation, because of the noise the computer actually
ends up solving a slightly perturbed version of it. Our work is
motivated by the following phenomenon, proposed by Spielman and
Teng [5]:

P1: For every input instance it is unlikely that a slight
random perturbation of that instance has large condition number.

If the input is a matrix, we can reformulate this in a more
quantitative way as follows:

P2: Let M be an arbitrary n by n matrix and $N_n$ a random
n by n matrix. Then with high probability $M + N_n$ is well
conditioned.

The crucial point here is that M itself may have large
condition number. The above phenomenon gives an explanation for
the fact (which has been observed numerically for some time;
see [8]) that one rarely encounters ill-conditioned matrices in
practice. This is also the core of the Spielman-Teng smooth
analysis, which we will discuss in more detail in Section 4.
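Phenomenon P2 can be probed empirically on small instances. The sketch below is an illustration under stated assumptions, not the paper's method: it perturbs a hypothetical rank-one integer matrix by random ±1 noise, inverts the result exactly with rational arithmetic, and uses the Frobenius-norm product $\|A\|_F \|A^{-1}\|_F$ as a proxy for the condition number (this proxy lies between $\kappa(A)$ and $n\,\kappa(A)$). All function names and parameters are made up for the example.

```python
import random
from fractions import Fraction

def inverse(a):
    """Exact Gauss-Jordan inverse of a square integer matrix, or None if singular."""
    n = len(a)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def frobenius(a):
    """Frobenius norm of a matrix with integer or rational entries."""
    return float(sum(x * x for row in a for x in row)) ** 0.5

def frobenius_condition(a):
    """||A||_F * ||A^{-1}||_F, an upper bound for kappa(A); infinite if A is singular."""
    inv = inverse(a)
    return float("inf") if inv is None else frobenius(a) * frobenius(inv)

def perturbed_condition_samples(m, trials, seed=0):
    """Sample the proxy condition number of M + N for random +-1 matrices N."""
    rng = random.Random(seed)
    n = len(m)
    samples = []
    for _ in range(trials):
        noisy = [[m[i][j] + rng.choice((-1, 1)) for j in range(n)] for i in range(n)]
        samples.append(frobenius_condition(noisy))
    return samples

n = 8
M = [[5] * n for _ in range(n)]            # rank one, so M itself is singular
samples = perturbed_condition_samples(M, trials=200)
finite = sorted(s for s in samples if s != float("inf"))
print("singular perturbations:", len(samples) - len(finite), "of", len(samples))
if finite:
    print("median proxy condition number:", finite[len(finite) // 2])
```

Such an experiment only suggests the behaviour for one small n; the polynomial bounds discussed in this paper are asymptotic statements.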

The goal of this paper is to show that under very general
assumptions on M and $N_n$, the matrix $M + N_n$ indeed has small
condition number with overwhelming probability. The main
novelty here is that we allow the random matrix $N_n$ to have a
discrete distribution. This is a natural assumption for random
variables involved in digital processes. On the other hand,
very little was known, prior to this paper, about this case.
Random discrete matrices are indeed much more difficult to
analyze than their continuous counterparts, and our analysis is
significantly different from those used earlier for the
continuous models. In particular, it relies heavily on a new
development in additive combinatorics, the so-called Inverse
Littlewood-Offord theory (see Section 5).

1.3 A necessary assumption 

Suppose that we would like to show that $M + N_n$ is well
conditioned. This requires bounding both $\|M + N_n\|$ and
$\|(M + N_n)^{-1}\|$ by a polynomial in n. Let us look at the
first norm. By the triangle inequality,

$$\|M\| - \|N_n\| \le \|M + N_n\| \le \|M\| + \|N_n\|.$$

In most models for random matrices, $\|N_n\|$ is $O(\sqrt{n})$
with very high probability. Thus $\|M + N_n\|$ is often
dominated by $\|M\|$. So in order to make
$\kappa(M + N_n) = n^{O(1)}$, it is natural to assume that
$\|M\| = n^{O(1)}$. In fact, as

$$\|M\|^2 = \sigma_1^2 \le \sum_{i,j} m_{ij}^2 = \sum_{i=1}^{n} \sigma_i^2 \le n\sigma_1^2 = n\|M\|^2,$$

where $m_{ij}$ are the entries of M, this assumption is
equivalent to saying that all entries of M are polynomially
bounded. We will make this assumption about M in the rest of
the paper. The main task now is to bound the second norm,
$\|(M + N_n)^{-1}\|$, from above.

2. THE RESULTS 
2.1 Continuous noise 

The case when the entries of $N_n$ are i.i.d. Gaussian random
variables (with mean zero and variance one) has been studied
by various authors [3], [8]. In particular, Sankar, Spielman
and Teng [8] proved

Theorem 2.2. Let M be an arbitrary n by n matrix.
Then for any x > 0,

$$P(\|(M + N_n)^{-1}\| \ge x) = O\!\left(\frac{\sqrt{n}}{x}\right).$$

It is well known that there are positive constants $c_1$ and
$c_2$ such that $P(\|N_n\| \ge c_1\sqrt{n}) \le \exp(-c_2 n)$.

Corollary 2.3. Let B > C + 3/2 be positive constants.
Let M be an arbitrary n by n matrix whose entries have
absolute value at most $n^C$. Then

$$P(\kappa(M + N_n) \ge n^B) = O(n^{-B+C+3/2}).$$

Proof. By the assumption on M and the fact about $\|N_n\|$,
$\|M + N_n\| = O(n^{C+1})$ with probability $1 - \exp(-\Omega(n))$.
By Theorem 2.2 (applied with $x = n^{B-C-1}$),
$\|(M + N_n)^{-1}\| \le n^{B-C-1}$ with probability
$1 - O(n^{-B+C+3/2})$. Thus the claim follows by the union
bound. □

2.4 Discrete noise: Bernoulli case 

Let us now consider random variables with discrete supports.
By rescaling, we can assume that their supports lie on
$\mathbb{Z}$ (or $\mathbb{Z}^d$ for some d). The most basic model
among random discrete matrices is the Bernoulli matrix, whose
entries are i.i.d. Bernoulli random variables (taking values -1
and 1 with probability 1/2).

Bounding the norm of the inverse of a random discrete ma- 
trix is a difficult task, and the techniques used for the con- 
tinuous case are no longer applicable. In fact, it is already 
not trivial to prove that a random Bernoulli matrix is al- 
most surely invertible. Efficient bounds on the norm of the 
inverse of a Bernoulli random matrix were obtained only 
very recently [7], [12].

Our first result here is the discrete analogue of Theorem 2.2,
where the Gaussian noise is replaced by the Bernoulli noise.

Theorem 2.5. For any constants A and C there is a constant B
such that the following holds. Let M be an integer n by n
matrix whose entries (in absolute value) are bounded from
above by $n^C$, and let $N_n$ be the n by n random Bernoulli
matrix. Then

$$P(\|(M + N_n)^{-1}\| \ge n^B) \le n^{-A}.$$

Corollary 2.6. For any constants A and C there is a
constant B such that the following holds. Let M be an
arbitrary n by n matrix whose entries (in absolute value) are
bounded from above by $n^C$, and let $N_n$ be the n by n random
Bernoulli matrix. Then

$$P(\kappa(M + N_n) \ge n^B) \le n^{-A}.$$

Remark 2.7. It is useful to have the right hand side be
$n^{-A}$ rather than just o(1). The reason is that in certain
applications (see for instance Section 4), we need to show
that polynomially many matrices have, simultaneously, small
condition numbers. The bound $n^{-A}$ guarantees that we can
achieve this by a straightforward union-bound argument.

Theorem 2.5 is a special case of a general theorem, which,
among other things, asserts that the same conclusion still
holds when we replace the Bernoulli random variable by an
arbitrary symmetric discrete random variable. We present this
theorem in the next subsection.

2.8 Arbitrary discrete noise 

Notation. For a real number x, we use e(x) to denote
$\exp(2\pi i x) = \cos 2\pi x + i \sin 2\pi x$.

Definition 2.9. Let $\mu \le 1/2$ and D be positive constants.
A random variable $\xi$ is $(\mu, D)$-bounded if there is an
integer $1 \le k \le D$ such that for any t

$$|E(e(\xi t))| \le (1 - \mu) + \mu \cos 2\pi k t.$$

A random vector (matrix) is $(\mu, D)$-bounded if its
coordinates (entries) are independent $(\mu, D)$-bounded random
variables.

Remark 2.10. We need to assume $\mu \le 1/2$ to guarantee
that $(1 - \mu) + \mu \cos 2\pi t$ is non-negative for all t.

Theorem 2.11. For any positive constants $\mu \le 1/2$, A, C
and D there is a constant B such that the following holds.
Let M be a fixed integer n by n matrix whose entries have
absolute value at most $n^C$. Let $N_n$ be an n by n
$(\mu, D)$-bounded random matrix whose entries have absolute
value at most $n^C$ (with probability one). Then

$$P(\sigma_n(M + N_n) \le n^{-B}) \le n^{-A}.$$

(Here $\sigma_n$ denotes the smallest singular value, so that
$\sigma_n(M + N_n) = \|(M + N_n)^{-1}\|^{-1}$.)



Remark 2.12. It is useful to note that the entries of $N_n$
are not required to have the same distribution. This allows
the possibility that the noise at a certain location is
correlated with the corresponding entry of the original matrix
M. For instance, it might be natural to expect that the noise
occurring at a larger entry of M has larger variance.

The following lemma provides a sufficient condition for
$(\mu, D)$-boundedness.

Lemma 2.13. Let $\xi$ be a symmetric discrete random variable
and assume that there is a positive integer s such that
$P(\xi = s) \ge \epsilon$. Then $\xi$ is
$(\epsilon/2, 2s)$-bounded.

Proof. (Proof of Lemma 2.13) By the symmetry of $\xi$
and the triangle inequality,

$$|E(e(\xi t))| = \Big|\sum_{m=-\infty}^{\infty} P(\xi = m)\cos 2\pi m t\Big| \le (1 - 2\epsilon) + |2\epsilon \cos 2\pi s t|.$$

Using the elementary inequality
$|\cos x| \le \frac{3}{4} + \frac{1}{4}\cos 2x$ with
$x = 2\pi s t$, we have

$$(1 - 2\epsilon) + |2\epsilon \cos 2\pi s t| \le \Big(1 - \frac{\epsilon}{2}\Big) + \frac{\epsilon}{2}\cos 4\pi s t,$$

concluding the proof. □

With this lemma, one can easily check that most basic
variables are $(\mu, D)$-bounded for some constants $\mu$ and D.
Let us list a few examples:

• (Bernoulli) $\xi$ is 1 or -1 with probability 1/2. We can
take $\epsilon = 1/2$ and s = 1.

• (Lazy coin flip) $\xi = 0$ with probability $1 - \alpha$ and
1 or -1 with probability $\alpha/2$. We can take
$\epsilon = \alpha/2$ and s = 1.

• (Discretized Gaussian) Define $\xi$ as follows:
$P(\xi = m) = P(m - 1/2 \le H < m + 1/2)$, where H is standard
Gaussian. We can take $\epsilon = P(1/2 \le H < 3/2)$ and
s = 1.

• As a generalization of the previous example, one can 
consider the discretization of any symmetric random 
variable. 
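The boundedness condition of Definition 2.9 can be spot-checked numerically for the first two examples. The following sketch evaluates $|E(e(\xi t))|$ for a finitely supported distribution and compares it with $(1 - \mu) + \mu\cos 2\pi k t$ on a grid of values of t. The function names, the grid size and the value of $\alpha$ are assumptions made for illustration; a grid check is of course weaker than the proof of Lemma 2.13.

```python
import math

def char_abs(dist, t):
    """|E e(xi t)| for a finite distribution given as {value: probability}."""
    re = sum(p * math.cos(2 * math.pi * m * t) for m, p in dist.items())
    im = sum(p * math.sin(2 * math.pi * m * t) for m, p in dist.items())
    return math.hypot(re, im)

def is_bounded(dist, mu, k, grid=2000, tol=1e-12):
    """Check |E e(xi t)| <= (1 - mu) + mu*cos(2*pi*k*t) on a grid of t in [0, 1]."""
    return all(
        char_abs(dist, j / grid)
        <= (1 - mu) + mu * math.cos(2 * math.pi * k * j / grid) + tol
        for j in range(grid + 1)
    )

bernoulli = {1: 0.5, -1: 0.5}                         # eps = 1/2, s = 1
alpha = 0.3                                           # hypothetical laziness parameter
lazy = {0: 1 - alpha, 1: alpha / 2, -1: alpha / 2}    # eps = alpha/2, s = 1

# Lemma 2.13 predicts (eps/2, 2s)-boundedness in both cases.
print(is_bounded(bernoulli, mu=0.25, k=2))
print(is_bounded(lazy, mu=alpha / 4, k=2))
# With k = 1 the inequality fails for Bernoulli at t = 1/2.
print(is_bounded(bernoulli, mu=0.25, k=1))
```

The last call shows why the frequency is doubled in the lemma: for the Bernoulli variable, $|\cos 2\pi t|$ equals 1 at $t = 1/2$, where $\cos 2\pi t$ itself equals -1.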

2.14 The general result 

Now we are going to present an even more general result,
which implies Theorem 2.11. In this result, we do not require
that the entries of the random matrix be independent.

Definition 2.15. Let $\mu \le 1/2$ and C, K be positive
constants. A random vector X of length n is said to be of type
$(\mu, C, K)$ if

• (boundedness) With probability one, all coordinates of
X are integers with absolute value at most $n^C$.

• (non-degeneracy) For any unit vector y,
$P(|X \cdot y| \le n^{-2}) \le 1 - \mu/2$. (This means that X is
not concentrated near a hyperplane.)

• (concentration) There are positive integers $a_1, \ldots, a_n$
with $\mathrm{lcm}(a_1, \ldots, a_n) \le n^K$ such that for any
vector $v \in \mathbb{Z}^n$,

$$\sup_{a \in \mathbb{Z}} P(X \cdot v = a) \le \int_0^1 \prod_{i=1}^{n} \big((1 - \mu) + \mu \cos 2\pi a_i v_i t\big)\,dt, \qquad (1)$$

where $\mathrm{lcm}(a_1, \ldots, a_n)$ (least common multiple) is
the smallest positive integer divisible by all $a_i$.

Remark 2.16. Here and later, one should not take the 
absolute constants such as —2 and 2 too seriously. We make 
no attempt to optimize these constants. The first two con- 
ditions in the definition are quite intuitive. The third and 
critical condition comes from Fourier analysis and the reader 
will have a better understanding of it after reading the next 
section. 

Definition 2.17. A collection of n random vectors
$Y_1, \ldots, Y_n$ in $\mathbb{R}^n$ is strongly linearly
independent if for any non-zero vector $y \in \mathbb{R}^n$ and
any $1 \le i \le n$,

$$P(Y_1, \ldots, Y_n \text{ linearly dependent} \mid Y_i = y) \le \exp(-\Omega(n)).$$

Theorem 2.18. (Main Theorem) For all positive constants
$\mu \le 1/2$, A, C, K there is a positive constant B such
that the following holds. Let $M_n$ be a random matrix with
the following two properties:

• The row vectors of $M_n$ are independent random vectors
of type $(\mu, C, K)$.

• The column vectors of $M_n$ are strongly linearly
independent.

Then

$$P(\sigma_n(M_n) \le n^{-B}) \le n^{-A}.$$

Remark 2.19. Actually in the concentration property, one
can omit a few coordinates in the product. To be more
precise, we can make the following weaker assumption:

• There is a subset E of $\{1, \ldots, n\}$ of at most
$n^{1-\epsilon}$ elements (for some constant $\epsilon > 0$) and
positive integers $a_i$, $i \in \{1, \ldots, n\} \setminus E$,
with lcm at most $n^K$ such that for any vector
$v \in \mathbb{Z}^n$,

$$\sup_{a \in \mathbb{Z}} P(X \cdot v = a) \le \int_0^1 \prod_{i \notin E} \big((1 - \mu) + \mu \cos 2\pi a_i v_i t\big)\,dt. \qquad (2)$$

Note that we do not require any control on the coordinates
in E. This allows us to handle, for instance, the case when
there are frozen entries which are not affected by noise.
(In this case we simply put these coordinates in E.) This
situation does occur in practice. In particular, a zero entry
is often noise-free.

3. PROOF OF THEOREM 2.11 

In order to derive Theorem 2.11 from Theorem 2.18, we first
need to verify that the rows of the matrix in Theorem 2.11 are
of type $(\mu, C, K)$ for some constants $\mu$, C and K. This
will be done in the first two subsections. Next, we need to
verify the strong linear independence. This will be done in
the last subsection.



3.1 Checking the concentration property 

In this subsection, we verify the concentration property in
the definition of $(\mu, C, K)$-type. This is based on the
following lemma.

Lemma 3.2. Let Z be an arbitrary integer vector and X
be a random $(\mu, D)$-bounded vector, both of length n. Then
there exist positive integers $a_1, \ldots, a_n$ at most D such
that for any vector $v \in \mathbb{Z}^n$

$$\sup_{a \in \mathbb{Z}} P((Z + X) \cdot v = a) \le \int_0^1 \prod_{i=1}^{n} \big((1 - \mu) + \mu \cos 2\pi a_i v_i t\big)\,dt.$$

Proof. As a can take any value, it suffices to prove the
statement for Z = 0. For an integer x, the indicator
$1_{x=0}$ of the event x = 0 can be expressed, using Fourier
analysis, as

$$1_{x=0} = \int_0^1 e(xt)\,dt.$$

Let $\xi_i$, $1 \le i \le n$, be the coordinates of X. The event
$X \cdot v = a$ can be rewritten as
$\sum_{i=1}^{n} \xi_i v_i - a = 0$. Thus

$$P(X \cdot v = a) = E \int_0^1 e\Big(\Big(\sum_{i=1}^{n} \xi_i v_i - a\Big)t\Big)\,dt.$$

As the $\xi_i$ are independent, the last expectation is equal to

$$\int_0^1 \exp(-2\pi i a t) \prod_{i=1}^{n} E\,e(\xi_i v_i t)\,dt \le \int_0^1 \prod_{i=1}^{n} |E(e(\xi_i v_i t))|\,dt.$$

As $\xi_i$ is $(\mu, D)$-bounded, there is a positive integer
$a_i \le D$ such that

$$|E(\exp(2\pi i \xi_i v_i t))| \le (1 - \mu) + \mu \cos 2\pi a_i v_i t,$$
completing the proof. □ 
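The Fourier identity used in this proof can be tested on a small Bernoulli example. The sketch below (all names and the sample vector are illustrative assumptions) computes $P(X \cdot v = a)$ for a random ±1 vector X in two ways, by brute-force enumeration and from the integral of $e(-at)\prod_i \cos 2\pi v_i t$, and then compares the largest point probability with a numerical estimate of the right-hand side of the lemma for $\mu = 1/4$ and $a_i = 2$, the parameters Lemma 2.13 gives for Bernoulli variables.

```python
import cmath
import itertools
import math

def prob_exact(v, a):
    """P(X . v = a) for a Bernoulli (+-1) vector X, by enumerating all sign patterns."""
    hits = sum(
        1 for signs in itertools.product((-1, 1), repeat=len(v))
        if sum(s * x for s, x in zip(signs, v)) == a
    )
    return hits / 2 ** len(v)

def prob_fourier(v, a):
    """Same probability from the integral of e(-a t) * prod_i cos(2 pi v_i t) on [0, 1].

    The integrand is a trigonometric polynomial, so an equispaced sum with more
    points than its degree reproduces the integral up to rounding error."""
    points = 2 * (sum(abs(x) for x in v) + abs(a)) + 1
    total = 0
    for j in range(points):
        t = j / points
        term = cmath.exp(-2j * math.pi * a * t)
        for x in v:
            term *= math.cos(2 * math.pi * x * t)
        total += term
    return (total / points).real

def concentration_bound(v, mu=0.25, k=2, grid=4000):
    """Midpoint-rule estimate of the integral of prod_i ((1-mu) + mu*cos(2 pi k v_i t))."""
    total = 0.0
    for j in range(grid):
        t = (j + 0.5) / grid
        total += math.prod((1 - mu) + mu * math.cos(2 * math.pi * k * x * t) for x in v)
    return total / grid

v = [1, 2, 3, 3]                               # hypothetical integer vector
reach = sum(abs(x) for x in v)
largest = max(prob_exact(v, a) for a in range(-reach, reach + 1))
print("sup_a P(X.v = a):", largest)
print("integral bound:  ", concentration_bound(v))
```

For this v the largest point probability is attained by three of the sixteen sign patterns, and it stays below the integral, as the lemma requires.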

3.3 Checking the non-degeneracy property 

Let y be a unit vector in $\mathbb{R}^n$, let X be a random
$(\mu, D)$-bounded vector of length n, and let Z be an arbitrary
integer vector of length n. We want to show that

$$P(|(Z + X) \cdot y| \le n^{-2}) \le 1 - \mu/2.$$

If $(Z + X) \cdot y$ has absolute value at most $n^{-2}$, then
$(Z + X) \cdot ny$ has absolute value at most $n^{-1}$. As y is a
unit vector, one of the coordinates of ny has absolute value at
least 1. Assume, without loss of generality, that the first
coordinate $y_1$ of ny is this large. Recall that
$X = (\xi_1, \ldots, \xi_n)$, where the $\xi_i$ are independent
$(\mu, D)$-bounded random variables. Conditioning on
$\xi_2, \ldots, \xi_n$, it suffices to show that for any interval
I of length $2n^{-1}$

$$P(\xi_1 y_1 \in I) \le 1 - \mu/2.$$

But since $\xi_1$ takes only integer values and $|y_1| \ge 1$,
distinct values of $\xi_1 y_1$ are at least one apart. Assume,
for a contradiction, that $P(\xi_1 y_1 \in I) > 1 - \mu/2$. This
would imply that there is a number s such that
$P(\xi_1 = s) > 1 - \mu/2$. Then by the triangle inequality

$$|E(e(\xi_1 t))| \ge |e(st)|(1 - \mu/2) - \mu/2 \ge 1 - \mu$$

for any t. On the other hand, as $\xi_1$ is $(\mu, D)$-bounded,

$$|E(e(\xi_1 t))| \le (1 - \mu) + \mu \cos 2\pi a_1 t$$

for some $a_1 \le D$. Taking t such that $\cos 2\pi a_1 t = -1$,
we obtain a contradiction and conclude the proof.



3.4 Checking the strong linear independence 

The strong linear independence of the column vectors of
a random $(\mu, D)$-bounded matrix is a consequence of the
following theorem, which can be proved by refining the proof
of [11, Theorem 1.6].

Theorem 3.5. Let $\mu \le 1/2$ and D, l be positive constants.
Then there is a positive constant
$\epsilon = \epsilon(\mu, D, l)$ such that the following holds.
For any set Y of l independent vectors from $\mathbb{R}^n$ and
n - l independent random $(\mu, D)$-bounded vectors of length
n, the probability that they are linearly dependent is at most
$(1 - \epsilon)^n$.

Remark 3.6. This theorem is a generalization of a well
known theorem of Kahn, Komlós and Szemerédi [5], which
asserts that the probability that a random Bernoulli matrix
is singular is exponentially small. To see this, recall that a
random Bernoulli vector is (1/4, 2)-bounded, and in Theorem
3.5 take l = 1 and fix Y to be the all-ones vector.

4. SMOOTH COMPLEXITY WITH DISCRETE 
NOISE 

Running times of algorithms are frequently estimated by
worst-case analysis. But in practice, it has been observed
that many algorithms perform significantly better than the
estimates obtained from the worst-case analysis. A few years
ago, Spielman and Teng [9], [10] came up with an ingenious
explanation for this fact. The rough idea behind their
argument is as follows. Even if the input I is the worst-case
one (which, in theory, would require a long running time),
because of the noise, the computer actually works on some
slightly randomly perturbed version of I. Next, one would show
that the running time on a slightly randomly perturbed input,
with high probability, is much smaller than the worst-case
one. The smooth complexity of an algorithm is the maximum over
its inputs of the expected running time of the algorithm under
slight perturbations of that input. The puzzling question here
is, of course: why is the perturbed input typically better
than the original (worst-case) one? In some sense, the "magic"
lies in the Phenomenon P1. The random noise guarantees that
the condition number of the perturbed input is small (so the
perturbed input is likely to be well conditioned), no matter
how ill conditioned the original input may be. The bound on
the condition number can then be used to derive a bound on the
running time of the algorithm.

In their works [8], [9], [10], Spielman and Teng (and
coauthors) assumed Gaussian noise (or more generally
continuous noise). Theorem 2.2 played a significant role in
their proofs.

An important (and largely open) problem is to obtain smooth 
complexity bounds when the noise is discrete. (We would 
like to thank Spielman for communicating this problem.) 
In fact, it is not clear how computers would compute with 
Gaussian (and other continuous) distributions without dis- 
cretizing them. This problem seems to pose a considerable 
mathematical challenge. Naturally, the first step would be 
to obtain estimates for the condition number with discrete 
noise. This step has now been accomplished in this paper. 
However, these estimates themselves are not always suffi- 
cient. To be more specific, the situation looks as follows: 

• There are problems where an efficient bound on the
condition number leads directly to an efficient complexity
bound. In such a situation, we obtain a smooth complexity
bound with discrete noise in the obvious manner. This seems
to be the case, e.g., with the problems involving Gaussian
elimination in [8]. In the proofs in [8], the critical fact
was that all n - 1 minors of a random perturbed matrix are
well conditioned, with high probability. This can be
obtained using our results combined with the union bound
(see the remark after Theorem 2.5).

• There are situations where, besides the estimate on the
condition number, further properties of the noise are
used. An important example is the simplex method in
linear programming. In the smooth analysis of this al- 
gorithm with Gaussian noise [10] , the fact that the dis- 
tribution is continuous was exploited at several places. 
Thus, even with the discrete version of the condition 
number estimates in hand, it is still not clear to us 
how to obtain a smooth complexity bound with dis- 
crete noise in this problem. 

5. KEY INGREDIENTS 

In this section, we present our key ingredients in the proof
of Theorem 2.18.

5.1 Generalized arithmetic progressions and 
their discretization 

One should take care to distinguish the sumset kA from the
dilate $k \cdot A$, defined for any real k as

$$k \cdot A := \{ka \mid a \in A\}.$$
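The difference between the two operations is easy to see on a small set. The sketch below assumes the standard definition of the k-fold sumset, $kA = \{a_1 + \cdots + a_k : a_i \in A\}$, which is not restated in this section; the function names are illustrative.

```python
def sumset(a, k):
    """k-fold sumset kA = {a_1 + ... + a_k : a_i in A} for a finite set of integers."""
    result = {0}
    for _ in range(k):
        result = {x + y for x in result for y in a}
    return result

def dilate(a, k):
    """Dilate k . A = {k * x : x in A}."""
    return {k * x for x in a}

A = {-1, 0, 1}
print(sorted(sumset(A, 3)))   # every integer from -3 to 3
print(sorted(dilate(A, 3)))   # only -3, 0 and 3
```

The dilate is always contained in the sumset, but it is usually much smaller, which is why the two notations must not be confused in the separation statements below.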

Let P be a GAP of integers of rank d and volume V. Our
first key ingredient is a theorem that shows that given any
specified scale parameter $R_0$, one can "discretize" P near
the scale $R_0$. More precisely, one can cover P by the sum
of a coarse progression and a small progression, where the
diameter of the small progression is much smaller (by an
arbitrarily specified factor of S) than the spacing of the
coarse progression, and both of these quantities are close to
$R_0$ (up to a bounded power of SV).

Theorem 5.2 (Discretization). [12] For every constant d
there is a constant d' such that the following holds.
Let $P \subset \mathbb{Z}$ be a symmetric generalized arithmetic
progression of rank d and volume V. Let $R_0$, S be positive
integers. Then there exists a number $R \ge 1$ and two
generalized progressions $P_{small}$, $P_{sparse}$ of rational
numbers with the following properties.

• (Scale) We have $R \le (SV)^{d'} R_0$.

• (Smallness) $P_{small}$ has rank at most d, volume at most
V, and takes values in $[-R/S, R/S]$.

• (Sparseness) $P_{sparse}$ has rank at most d, volume at
most V, and any two distinct elements of $S P_{sparse}$ are
separated by at least RS.

• (Covering) We have $P \subseteq P_{small} + P_{sparse}$.

5.3 Inverse Littlewood-Offord theorem

Our second key ingredient is a theorem which characterizes
all sets $V = \{v_1, \ldots, v_n\}$ such that
$\int_0^1 \prod_{i=1}^{n} ((1 - \mu) + \mu \cos 2\pi v_i t)\,dt$
is large. This theorem is a refinement of [12, Theorem 2.5]
(see Remark 2.8 from this paper) and will enable us to exploit
the non-concentration property from Definition 2.15 in a
critical way.

Theorem 5.4. Let $0 < \mu \le 1$ and $A, \alpha > 0$ be
arbitrary. Then there is a positive constant A' such that the
following holds. Assume that $V = \{v_1, \ldots, v_n\}$ is a
multiset of integers satisfying

$$\int_0^1 \prod_{i=1}^{n} \big((1 - \mu) + \mu \cos 2\pi v_i t\big)\,dt \ge n^{-A}.$$

Then there is a GAP Q of rank at most A' and volume
at most $n^{A'}$ which contains all but at most $n^{\alpha}$
elements of V (counting multiplicity). Furthermore, there is an
integer $1 \le s \le n^{A'}$ such that $su \in V$ for each
generator u of Q.

With the two key tools in hand, we are now ready to prove
Theorem 2.18.

6. PROOF OF THEOREM 2.18 

Let B > 10 be a large number (depending on the type of
$M_n$) to be chosen later. If $\sigma_n(M_n) \le n^{-B}$ then
there exists a unit vector v such that

$$\|M_n v\| \le n^{-B}.$$

By rounding each coordinate of v to the nearest multiple of
$n^{-B-2}$, we can find a vector
$\tilde{v} \in n^{-B-2} \cdot \mathbb{Z}^n$ of magnitude
$0.9 \le \|\tilde{v}\| \le 1.1$ such that

$$\|M_n \tilde{v}\| \le 2n^{-B}.$$

Writing $w := n^{B+2}\tilde{v}$, we thus can find an integer
vector $w \in \mathbb{Z}^n$ of magnitude
$0.9\,n^{B+2} \le \|w\| \le 1.1\,n^{B+2}$ such that

$$\|M_n w\| \le 2n^2.$$
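The rounding step can be made concrete with a short sketch. It rounds a random unit vector to the lattice of multiples of $n^{-B-2}$ and checks that the resulting integer vector has norm within the stated range. The parameter values and function names are hypothetical, chosen only so the numbers stay small; the sketch illustrates the rounding itself, not the bound on $\|M_n \tilde{v}\|$.

```python
import math
import random

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def round_to_lattice(v, n, B):
    """Round each coordinate of v to the nearest multiple of n**(-B-2).

    Returns the rounded vector and the integer vector w = n**(B+2) * rounded."""
    scale = n ** (B + 2)
    w = [round(x * scale) for x in v]
    return [k / scale for k in w], w

rng = random.Random(1)
n, B = 6, 3                                   # hypothetical small parameters
g = [rng.gauss(0.0, 1.0) for _ in range(n)]
g_norm = norm(g)
v = [x / g_norm for x in g]                   # random unit vector

v_tilde, w = round_to_lattice(v, n, B)
shift = norm([a - b for a, b in zip(v, v_tilde)])
# Each coordinate moves by at most half a lattice step.
print("rounding error:", shift, "<=", math.sqrt(n) / (2 * n ** (B + 2)))
print("norm of w / n^(B+2):", norm(w) / n ** (B + 2))
```

Since each coordinate moves by at most $n^{-B-2}/2$, the whole vector moves by at most $\sqrt{n}\,n^{-B-2}/2$, which is why the norm of the rounded vector stays between 0.9 and 1.1.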

Let $\Omega$ be the set of integer vectors $w \in \mathbb{Z}^n$
of magnitude $0.9\,n^{B+2} \le \|w\| \le 1.1\,n^{B+2}$. It
suffices to show the probability bound

$$P(\text{there is some } w \in \Omega \text{ such that } \|M_n w\| \le 2n^2) \le n^{-A}.$$

We now partition the elements $w = (w_1, \ldots, w_n)$ of
$\Omega$ into three sets:

• We say that w is rich if

$$\sup_{a \in \mathbb{Z},\ 1 \le i \le n} P(X_i \cdot w = a) \ge n^{-A-4},$$

where $X_i$ are the row vectors of $M_n$. Otherwise we say
that w is poor. Let $\Omega_1$ be the set of poor w's.

• A rich w is singular if fewer than $n^{0.2}$ of its
coordinates have absolute value $n^{B/2}$ or greater. Let
$\Omega_2$ be the set of rich and singular w's.

• A rich w is non-singular if at least $n^{0.2}$ of its
coordinates have absolute value $n^{B/2}$ or greater. Let
$\Omega_3$ be the set of rich and non-singular w's.

Remark 6.1. Again one should not take the absolute con- 
stants —4, 1/2 and .2 too seriously. 

The desired estimate follows directly from the following lem- 
mas and the union bound. 



Lemma 6.2 (Estimate for poor w).
$P(\text{there is some } w \in \Omega_1 \text{ such that } \|M_n w\| \le 2n^2) = o(n^{-A})$.

Lemma 6.3 (Estimate for rich singular w).
$P(\text{there is some } w \in \Omega_2 \text{ such that } \|M_n w\| \le 2n^2) = o(n^{-A})$.

Lemma 6.4 (Estimate for rich non-singular w).
$P(\text{there is some } w \in \Omega_3 \text{ such that } \|M_n w\| \le 2n^2) = o(n^{-A})$.

The proofs of Lemmas 6.2 and 6.3 are relatively simple and
rely on well-known methods. The proof of Lemma 6.4, which
is essentially the heart of the matter, is more difficult and
requires the tools provided in Section 5.

7. PROOF OF LEMMA 6.2

We use a conditioning argument, following [7]. (An argument
in the same spirit was used by Komlós to prove the bound
$O(n^{-1/2})$ for the singularity problem.) Let M be a matrix
such that there is $w \in \Omega_1$ satisfying
$\|Mw\| \le 2n^2$. Since M and its transpose have the same
spectral norm, there is a vector w' which has the same norm as
w such that $\|w'M\| \le 2n^2$. Let $u = w'M$ and let $X_i$ be
the row vectors of M. Then

$$u = \sum_{i=1}^{n} w'_i X_i,$$

where $w'_i$ are the coordinates of w'. Now consider
$M = M_n$. By paying a factor of n in the probability
(whenever this phrase is used, keep in mind that we will use
the union bound to conclude the proof), we can assume that
$w'_n$ has the largest absolute value among the $w'_i$. We
expose the first n - 1 rows $X_1, \ldots, X_{n-1}$ of $M_n$. If
there is $w \in \Omega_1$ satisfying $\|Mw\| \le 2n^2$, then
there is a vector $y \in \Omega_1$, depending only on the first
n - 1 rows, such that

$$\Big(\sum_{i=1}^{n-1} (X_i \cdot y)^2\Big)^{1/2} \le 2n^2.$$

We can write $X_n$ as

$$X_n = \frac{1}{w'_n}\Big(u - \sum_{i=1}^{n-1} w'_i X_i\Big).$$

Thus,

$$|X_n \cdot y| = \frac{1}{|w'_n|}\Big|u \cdot y - \sum_{i=1}^{n-1} w'_i X_i \cdot y\Big|.$$

The right hand side, by the triangle inequality, is at most

$$\frac{1}{|w'_n|}\Big(|u||y| + |w'|\Big(\sum_{i=1}^{n-1} (X_i \cdot y)^2\Big)^{1/2}\Big).$$

By assumption $|w'_n| \ge n^{-1/2}|w'|$. Furthermore, as
$|u| \le 2n^2$, we have $|u||y| \le 2n^2|y| \le 3n^2|w'|$, as
$|w'| = |w|$ and both y and w belong to $\Omega_1$. (Any two
vectors in $\Omega_1$ have roughly the same length.) Finally
$(\sum_{i=1}^{n-1} (X_i \cdot y)^2)^{1/2} \le 2n^2$. Putting
all these together, we have

$$|X_n \cdot y| \le 5n^{5/2}.$$

Recall that both $X_n$ and y are integer vectors, so
$X_n \cdot y$ is an integer. The probability that
$|X_n \cdot y| \le 5n^{5/2}$ is at most

$$(10n^{5/2} + 1)\sup_{a \in \mathbb{Z}} P(X_n \cdot y = a).$$

On the other hand, y is poor, so by definition
$\sup_{a \in \mathbb{Z}} P(X_n \cdot y = a) \le n^{-A-4}$. Thus,
it follows that

$$P(\text{there is some } w \in \Omega_1 \text{ such that } \|M_n w\| \le 2n^2) \le n^{-A-4}(10n^{5/2} + 1)\,n = o(n^{-A}),$$

where the extra factor n comes from the assumption that
$w'_n$ has the largest absolute value. This completes the
proof.

8. PROOF OF LEMMA 6.3

We use an argument from [6]. The key point will be that
the set $\Omega_2$ of rich singular vectors has sufficiently low
entropy that one can proceed using the union bound. A set
N of vectors on the n-dimensional unit sphere $S_{n-1}$ is said
to be an $\epsilon$-net if for any $x \in S_{n-1}$, there is
$y \in N$ such that $\|x - y\| \le \epsilon$. A standard greedy
argument shows

Lemma 8.1. For any n and $\epsilon \le 1$, there exists an
$\epsilon$-net of cardinality at most $O(1/\epsilon)^n$.

We need another lemma, showing that for any unit vector
y, very likely $\|M_n y\|$ is polynomially large.

Lemma 8.2. For any unit vector y

$$P(\|M_n y\| \le n^{-2}) = \exp(-\Omega(n)).$$

Proof. If $\|M_n y\| \le n^{-2}$, then
$|X_i \cdot y| \le n^{-2}$ for all indices $1 \le i \le n$.
However, by the assumption of the theorem, for any fixed i, the
probability that $|X_i \cdot y| \le n^{-2}$ is at most
$1 - \mu/2$. Thus, as the rows are independent,

$$P(\|M_n y\| \le n^{-2}) \le (1 - \mu/2)^n = \exp(-\Omega(n)),$$

concluding the proof. □

For a vector $w \in \Omega_2$, let w' be its normalization
$w' := w/\|w\|$. Thus, w' is a unit vector with at most
$n^{0.2}$ coordinates with absolute value at least
$n^{-B/2}$. By choosing $B \ge 2C + 20$, we can assume that w'
belongs to $\Omega'_2$, the collection of unit vectors with at
most $n^{0.2}$ coordinates with absolute value at least
$n^{-C-10}$. If $\|Mw\| \le 2n^2$ for some $w \in \Omega_2$,
then $\|Mw'\| \le 3n^{-B}$, as $\|w\| \ge 0.9\,n^{B+2}$. Thus,
it suffices to give an exponential bound on the event that
there is $w' \in \Omega'_2$ such that
$\|M_n w'\| \le 3n^{-B}$. By paying a factor of
$\binom{n}{n^{0.2}} = \exp(o(n))$ in probability, we can assume
that the large coordinates (with absolute value at least
$n^{-C-10}$) are among the first $l := n^{0.2}$ coordinates.
Consider an $n^{-C-3}$-net N in $S_{l-1}$. For each vector
$y \in N$, let y' be the n-dimensional vector obtained from y
by letting the last n - l coordinates be zeros, and let N' be
the set of all such vectors obtained. These vectors have
magnitude between 0.9 and 1.1, and from Lemma 8.1 we have
$|N'| \le O(n^{C+3})^l$. Now consider a rich singular vector
$w' \in \Omega'_2$ and let w'' be the l-dimensional vector
formed by the first l coordinates of this vector. As the
remaining coordinates are small,
$\|w''\| = 1 + O(n^{-C-9.5})$. There is a vector $y \in N$ such
that

$$\|y - w''\| \le n^{-C-3} + O(n^{-C-9.5}).$$

It follows that there is a vector $y' \in N'$ such that

$$\|y' - w'\| \le n^{-C-3} + O(n^{-C-9.5}) \le 2n^{-C-3}.$$

If M has norm at most $n^{C+1}$, then

$$\|Mw'\| \ge \|My'\| - 2n^{-C-3}\,n^{C+1} = \|My'\| - 2n^{-2}.$$

It follows that if $\|Mw'\| \le 3n^{-B}$ for some $B \ge 2$,
then $\|My'\| \le 5n^{-2}$. Now take $M = M_n$. For each fixed
y', the probability that $\|M_n y'\| \le 5n^{-2}$ is at most
$\exp(-\Omega(n))$, by Lemma 8.2. Furthermore, the number of y'
is subexponential (at most
$O(n^{C+3})^{n^{0.2}} = \exp(o(n))$). The claim follows by the
union bound.

9. PROOF OF LEMMA 6.4

This is the most difficult part of the proof, where we will
need all the tools provided in Section 5. Informally, the
strategy is to use the inverse Littlewood-Offord theorem to
place the integers $w_1, \ldots, w_n$ in a progression, which
we then discretize using Theorem 5.2. This allows us to
replace the event $\|M_n w\| \le 2n^2$ by some dependence event
involving the columns of $M_n$, whose probability is very small
by the strong linear independence assumption of the theorem.

We now turn to the details. By the inverse theorem and
the non-concentration property from Definition 2.15, there
is a constant A' such that for each $w \in \Omega_3$ there
exists a symmetric GAP Q of integers of rank at most d and
volume at most $n^{A'}$ and non-zero integers
$a_1, \ldots, a_n$ with least common multiple at most $n^K$
such that Q contains all but $\lfloor n^{0.1} \rfloor$ of the
integers $a_1 w_1, \ldots, a_n w_n$. Furthermore, the
generators of Q are of the form $a_i w_i / s$ for some
$1 \le s \le n^{A'}$. Notice that if $a_i w_i \in Q$ then
$w_i \in Q' := \{x/a \mid x \in Q,\ a \in \mathbb{Z},\ a \ne 0,\ |a| \le n^K\}$.
Using the description of Q and the fact that
$w_1, \ldots, w_n$ and $a_1, \ldots, a_n$ are polynomially
bounded in n, one can see that the total number of possible Q
is $n^{O(1)} = \exp(o(n))$. Next, by paying a factor of

$$\binom{n}{\lfloor n^{0.1} \rfloor} \le n^{\lfloor n^{0.1} \rfloor} = \exp(o(n))$$

we may assume that it is the last $\lfloor n^{0.1} \rfloor$
integers $a_{m+1} w_{m+1}, \ldots, a_n w_n$ which possibly lie
outside Q, where we set $m := n - \lfloor n^{0.1} \rfloor$. As
each of the $w_i$ has absolute value at most $1.1\,n^{B+2}$,
the number of ways to fix these exceptional elements is at
most $(2.2\,n^{B+2})^{n^{0.1}} = \exp(o(n))$. Overall, it costs
a factor of only $\exp(o(n))$ (keep in mind that we intend to
use the union bound) to fix Q and the positions and values of
the exceptional elements of w.

Notice that $M_n w = w_1 Y_1 + \cdots + w_n Y_n$, where $Y_i$
is the i-th column of $M_n$. Fix $w_{m+1}, \ldots, w_n$ and set
$Y := \sum_{i=m+1}^{n} w_i Y_i$. This way we can rewrite
$M_n w$ as

$$M_n w = w_1 Y_1 + \cdots + w_m Y_m + Y.$$

For any vector y, define $F_y$ to be the event that there exist
$w_1, \ldots, w_m$ in the set Q', where at least one of the
$w_i$ has absolute value at least $n^{B/2}$, such that

$$|w_1 Y_1 + \cdots + w_m Y_m + y| \le 2n^2.$$

It suffices to prove that for any y

$$P(F_y) \le \exp(-\Omega(n)).$$

We now apply Theorem 5.2 to the GAP Q with
$R_0 := n^{B/3}$ and $S := n^L$, where L = C + K + 2 (C and K
are the constants in Definition 2.15). By choosing B
sufficiently large, we can guarantee that B/3 is considerably
larger than L. Recall that the volume of Q is at most
$n^{A'}$, where A' is a constant depending on A and $\mu$. We
can find a number $R = n^{B/3 + O(1)}$ and symmetric GAPs
$Q_{sparse}$, $Q_{small}$ of rank at most d' = d'(d, A') and
volume at most $n^{A'}$ such that

• $Q \subseteq Q_{sparse} + Q_{small}$.

• $Q_{small} \subseteq [-n^{-L}R,\ n^{-L}R]$.

• The elements of $n^L Q_{sparse}$ are $n^L R$-separated.

Since Q (and hence $Q_{sparse} + Q_{small}$) contains
$a_1 w_1, \ldots, a_m w_m$ (for some set
$\{a_1, \ldots, a_m\}$), we can therefore write

$$w_j = a_j^{-1}\big(w_j^{sparse} + w_j^{small}\big)$$

for all $1 \le j \le m$, where $w_j^{sparse} \in Q_{sparse}$
and $w_j^{small} \in Q_{small}$. In fact, this decomposition is
unique. Suppose that the event $F_y$ holds. Let
$y = (y_1, \ldots, y_n)$ and let $\eta_{i,j}$ denote the entry
of $M_n$ at row i and column j. We have

$$w_1 \eta_{i,1} + \cdots + w_m \eta_{i,m} = y_i + O(n^2)$$

for all $1 \le i \le n$. We split the $w_j$ into sparse and
small components and estimate the small components. The
contribution coming from the small components is

$$\sum_{j=1}^{m} a_j^{-1} w_j^{small} \eta_{i,j} = O(n^{-L+C+1} R),$$

since the $|\eta_{i,j}|$ are bounded from above by $n^C$, the
$|w_j^{small}|$ are bounded from above by $n^{-L}R$, and the
$a_j$ are positive integers. By the triangle inequality, it
follows that

$$a_1^{-1} w_1^{sparse} \eta_{i,1} + \cdots + a_m^{-1} w_m^{sparse} \eta_{i,m} = y_i + O(n^{-L+C+1} R)$$

for all $1 \le i \le n$.

Set $T := \mathrm{lcm}(a_1, \ldots, a_m)$. The previous estimate
implies

$$b_1 w_1^{sparse} \eta_{i,1} + \cdots + b_m w_m^{sparse} \eta_{i,m} = T y_i + O(T n^{-L+C+1} R),$$

where $b_i = T/a_i$. Now we use the assumption that
$T \le n^K$ from Definition 2.15. This assumption yields that
$b_i \le n^K$ and the left-hand side lies in

$$n^{K+C+1} Q_{sparse} \subseteq n^L Q_{sparse},$$

which is known to be $n^L R$-separated. Furthermore,

$$O(T n^{-L+C+1} R) = O(n^{K-L+C+1} R) = O(n^{-1} R)$$

by the definition of L. Thus there is a unique value for the
right-hand side, call it $y'_i$, which depends only on y and Q,
such that

$$b_1 w_1^{sparse} \eta_{i,1} + \cdots + b_m w_m^{sparse} \eta_{i,m} = y'_i.$$

The point is that we have now eliminated the O() errors, and
have thus essentially converted the singular value problem
to a problem about dependence. Note also that since one of
the $w_1, \ldots, w_m$ is known to have magnitude at least
$n^{B/2}$ (which will be much larger than
$n^L R = n^{B/3 + O(1)}$ given that we set
$B \ge 6L = 6(C + K + 2)$), we see that at least one of the
$w_1^{sparse}, \ldots, w_m^{sparse}$ is non-zero.

Let $y' = (y'_1, \ldots, y'_n)$. The equation

$$b_1 w_1^{sparse} \eta_{i,1} + \cdots + b_m w_m^{sparse} \eta_{i,m} = y'_i$$

implies that the first m columns of $M_n$ span y'. For any
fixed non-zero y', the probability that this happens is
exponentially small by the strong linear independence
assumption. This completes the proof.

10. FROZEN ENTRIES 

We now give an explanation of Remark 2.19. This remark
is based on the fact that in the previous proof one is allowed
to have as many as $n^{1-\epsilon}$ coordinates outside the set
Q', for any positive constant $\epsilon < 1$. Indeed, these
extra coordinates contribute a factor of
$\binom{n}{n^{1-\epsilon}}$, which is $\exp(o(n))$. This factor
will be swallowed by the exponential bound we have at the
end of the proof. (In the proof we, for convenience, set
$\epsilon = 0.9$ and have $n^{0.1}$ exceptional coordinates, but
the actual value of $\epsilon$ plays no role.) The main point
here is that we can set aside the "frozen" coordinates even
before applying the Inverse Littlewood-Offord theorem.